3,078 research outputs found

A New Quartet Tree Heuristic for Hierarchical Clustering

Author: Cilibrasi Rudi
Vitanyi Paul M. B.
Publication date: 01/01/2006

We consider the problem of constructing an optimal-weight tree from the 3*(n choose 4) weighted quartet topologies on n objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so it can be the case that the optimal tree embeds all quartets as non-optimal topologies). We present a heuristic for reconstructing the optimal-weight tree, and a canonical manner to derive the quartet-topology weights from a given distance matrix. The method repeatedly transforms a bifurcating tree, with all objects involved as leaves, achieving a monotonic approximation to the exact single globally optimal tree. This contrasts with other heuristic search methods from biological phylogeny, like DNAML or quartet puzzling, which repeatedly and incrementally construct a solution from a random order of objects, and subsequently add agreement values.
Comment: 22 pages, 14 figures

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

The Google Similarity Distance

Author: Cilibrasi Rudi
Vitanyi Paul M. B.
Publication date: 01/01/2007

Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of 'society' is 'database,' and the equivalent of 'use' is 'way to search the database.' We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity. To fix thoughts we use the world-wide-web as database, and Google as search engine. The method is also applicable to other search engines and databases. This theory is then applied to construct a method to automatically extract similarity, the Google similarity distance, of words and phrases from the world-wide-web using Google page counts. The world-wide-web is the largest database on earth, and the context information entered by millions of independent users averages out to provide automatic semantics of useful quality. We give applications in hierarchical clustering, classification, and language translation. We give examples to distinguish between colors and numbers, cluster names of paintings by 17th century Dutch masters and names of books by English novelists, the ability to understand emergencies, and primes, and we demonstrate the ability to do a simple automatic English-Spanish translation. Finally, we use the WordNet database as an objective baseline against which to judge the performance of our method. We conduct a massive randomized trial in binary classification using support vector machines to learn categories based on our Google distance, resulting in a mean agreement of 87% with the expert crafted WordNet categories.
Comment: 15 pages, 10 figures; changed some text/figures/notation/part of theorem. Incorporated referees' comments. This is the final published version up to some minor changes in the galley proof
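
The abstract above describes deriving a similarity distance from search-engine page counts. A minimal sketch of the commonly cited formula, NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}), is given below. The function name `ngd`, its parameter names, and the page counts in the usage example are assumptions made for illustration; they are not taken from this listing, and no real search engine is queried.

```python
import math


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google distance from page counts (illustrative sketch).

    fx, fy : number of pages containing term x, term y (assumed > 0)
    fxy    : number of pages containing both terms
    n      : total number of indexed pages (assumed larger than fx and fy)
    """
    if fxy == 0:
        # The terms never co-occur, so the distance is treated as infinite.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


if __name__ == "__main__":
    # Hypothetical page counts, chosen only to exercise the function.
    print(ngd(fx=9_000, fy=4_000, fxy=1_500, n=10_000_000))
```

Terms that always occur together get a distance of 0, and the value grows as co-occurrence becomes rarer relative to the individual counts.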

arXiv.org e-Print Archive

CiteSeerX

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

Normalized Web Distance and Word Similarity

Author: Cilibrasi Rudi L.
Vitanyi Paul M. B.
Publication date: 01/01/2009

There is a great deal of work in cognitive psychology, linguistics, and computer science, about using word (or phrase) frequencies in context in text corpora to develop measures for word similarity or word association, going back to at least the 1960s. The goal of this chapter is to introduce the normalized web distance (NWD) method to determine similarity between words and phrases. It is a general way to tap the amorphous low-grade knowledge available for free on the Internet, typed in by local users aiming at personal gratification of diverse objectives, and yet globally achieving what is effectively the largest semantic electronic database in the world. Moreover, this database is available for all by using any search engine that can return aggregate page-count estimates for a large range of search-queries. In the paper introducing the NWD it was called 'normalized Google distance (NGD),' but since Google doesn't allow computer searches anymore, we opt for the more neutral and descriptive NWD.
Comment: Latex, 20 pages, 7 figures, to appear in: Handbook of Natural Language Processing, Second Edition, Nitin Indurkhya and Fred J. Damerau Eds., CRC Press, Taylor and Francis Group, Boca Raton, FL, 2010, ISBN 978-142008592

arXiv.org e-Print Archive

CiteSeerX

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

A Fast Quartet Tree Heuristic for Hierarchical Clustering

Author: Cilibrasi Rudi L.
Vitanyi Paul M. B.
Publication date: 12/09/2014

The Minimum Quartet Tree Cost problem is to construct an optimal weight tree from the $3{n \choose 4}$ weighted quartet topologies on $n$ objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so it can be the case that the optimal tree embeds all quartets as nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized hill climbing, for approximating the optimal weight tree, given the quartet topology weights. The method repeatedly transforms a dendrogram, with all objects involved as leaves, achieving a monotonic approximation to the exact single globally optimal tree. The problem and the solution heuristic have been extensively used for general hierarchical clustering of nontree-like (non-phylogeny) data in various domains and across domains with heterogeneous data. We also present a greatly improved heuristic, reducing the running time by a factor of order a thousand to ten thousand. All this is implemented and available, as part of the CompLearn package. We compare performance and running time of the original and improved versions with those of UPGMA, BioNJ, and NJ, as implemented in the SplitsTree package on genomic data for which the latter are optimized.
Keywords: Data and knowledge visualization, Pattern matching--Clustering--Algorithms/Similarity measures, Hierarchical clustering, Global optimization, Quartet tree, Randomized hill-climbing
Comment: LaTeX, 40 pages, 11 figures; this paper has substantial overlap with arXiv:cs/0606048 in cs.D

arXiv.org e-Print Archive

CiteSeerX

CWI's Institutional Repository

Worrying and rumination are both associated with reduced cognitive control

Author: Beckwe M
De Lissnyder E
De Raedt Rudi
Deroost N
Koster Ernst
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Persistent negative thought is a hallmark feature of both major depressive disorder and generalized anxiety disorder. Despite its clinical significance, little is known about the underlying mechanisms of persistent negative thought. Recent studies suggest that reduced cognitive control might be an explanatory factor. We investigated the association between persistent negative thought and switching between internal representations in working memory, using the internal shift task (IST). The IST was administered to a group of undergraduates, classified as high-ruminators versus low-ruminators, or high-worriers versus low-worriers. Results showed that high-ruminators and high-worriers have more difficulty switching between internal representations in working memory than low-ruminators and low-worriers. Importantly, results were only significant when the negative stimuli used in the IST reflected personally relevant worry themes for the participants. The results of this study indicate that rumination and worrying are both associated with reduced cognitive control for verbal information that is personally relevant.

Ghent University Academic Bibliography

Normalized Information Distance

Author: Balbach Frank J.
Cilibrasi Rudi L.
Li Ming
Vitanyi Paul M. B.
Publication date: 01/01/2008

The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation.
Comment: 33 pages, 12 figures, pdf, in: Normalized information distance, in: Information Theory and Statistical Learning, Eds. M. Dehmer, F. Emmert-Streib, Springer-Verlag, New-York, To appear
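
The first realization mentioned in the abstract above, approximating Kolmogorov complexity with a compressor, can be sketched as the normalized compression distance NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}, where C is the compressed length. The sketch below uses zlib from the Python standard library as the compressor, which is an assumption made for illustration; the listing does not say which compressor the chapter uses, and the helper names `compressed_size` and `ncd` are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input, a stand-in for C(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings (sketch)."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # C(xy): compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 50
    rng = random.Random(0)  # fixed seed so the run is repeatable
    noise = bytes(rng.getrandbits(8) for _ in range(2000))
    print("text vs. itself:", ncd(text, text))
    print("text vs. noise: ", ncd(text, noise))
```

With a real compressor the distance of an object to itself is small but not exactly zero, and unrelated inputs give values near 1, so the result is best read as a relative measure.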

arXiv.org e-Print Archive

CiteSeerX

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Diffusion-Induced Oscillations of Extended Defects

Author: Alexander L. Korzhenevskii
M. Shore
N. N. Bogoliubov
Richard Bausch
Rudi Schmitz
Publication venue: 'American Physical Society (APS)'
Publication date: 15/12/2011

From a simple model for the driven motion of a planar interface under the influence of a diffusion field we derive a damped nonlinear oscillator equation for the interface position. Inside an unstable regime, where the damping term is negative, we find limit-cycle solutions, describing an oscillatory propagation of the interface. In the case of a growing solidification front this offers a transparent scenario for the formation of solute bands in binary alloys, and, taking into account the Mullins-Sekerka instability, of banded structures.

arXiv.org e-Print Archive

Crossref

Publikationsserver der RWTH Aachen University

A20 deficiency sensitizes pancreatic beta cells to cytokine-induced apoptosis in vitro but does not influence type 1 diabetes development in vivo

Author: Beyaert Rudi
Cardozo AK
Catrysse Leen
Fukaya M
Meyerovich K
Sze Mozes
van Loo Geert
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Ghent University Academic Bibliography

PubMed Central

DI-fusion

Implementing ‘PATRIOT’ As an Integrated Model of Instruction to Rebuild the Culture of Entrepreneurship in Higher Education

Author: Rudi Irwansyah M
Suharsono N
Publication venue: 'Knowledge E'
Publication date: 24/03/2019

This research aimed to find an instructional technology program for Entrepreneurship that runs from theory to practice. The integrated events were shown through the acquisition of theoretical knowledge, which was then applied in business firms and completed by action. The research activities started with prototyping four instructional packet programs based on the PATRIOT model of instruction, which were then offered to students through integrated instruction to increase their ability to conduct the Totally Entrepreneurship actions (

Neliti

KnE Publishing Platform